Rare events and long-term risks
The GitHub repo of the first part of the course contains the HTML and Quarto notebooks for the code. All the academic references (papers) and many others can be accessed in the following zip file.
NOTE: The present notebook is coded in R. It relies heavily on the tidyverse ecosystem of packages. We load the tidyverse below as a prerequisite for the rest of the notebook - along with a few other libraries.
\(\rightarrow\) Don’t forget that code flows sequentially. A given chunk may not work if the previous ones have not been executed.
library(tidyverse) # Package for data wrangling
library(readxl) # Package to import MS Excel files
library(latex2exp) # Package for LaTeX expressions
library(quantmod) # Package for stock data extraction
library(highcharter) # Package for reactive plots
library(roll) # Package for rolling simple operations
library(broom) # Package for grouped regressions
1 Context
1.1 Market turmoil
Rare events are sometimes called Black Swans in financial economics. The most visible cases are stock market crashes. They come in several flavours:
- either abrupt, like the Black Monday of 1987, in which a market index can lose 20% in just one day;
- or more progressive, like the tech-bubble crash of 2000 or the 2007-2008 financial crisis (subprime meltdown).
Let’s have a look. Below, we extract the time-series of the S&P500, one of the most important indices in US equity markets. In fact, we work with the corresponding Exchange-Traded Fund, SPY, downloaded with the {quantmod} package.
min_date <- "1990-01-01"
max_date <- "2025-01-05"
prices <- getSymbols("SPY", src = 'yahoo', # Yahoo source (ETF ticker) ^GSPC
from = min_date,
to = max_date,
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`("SPY")
highchart(type = "stock") %>%
hc_title(text = "Evolution of the SPY") %>%
hc_add_series(prices) %>%
hc_tooltip(pointFormat = '{point.y:.3f}') # To round the y-axis numbers
Next, we depict the distribution of returns:
\[r_t=\frac{p_t-p_{t-1}}{p_{t-1}}\]
Note that, for simplicity, it is customary to assume that returns follow a Gaussian distribution.
returns_d <- prices %>% # Formula for returns
data.frame(Date = index(.)) %>% # Adding the Date as a column
mutate(SPY = SPY / dplyr::lag(SPY) - 1) %>% # Returns
na.omit() # Removing NA rows
m <- mean(returns_d$SPY) # Average daily return
s <- sd(returns_d$SPY) # Volatility = sd of daily returns
returns_d %>% ggplot() + # Plot
geom_histogram(aes(x = SPY, y = ..density..), bins = 100, alpha = 0.6) + theme_light() +
stat_function(fun = dnorm, args = list(mean = m, sd = s), aes(color = "Gaussian")) +
theme(legend.position = c(0.7, 0.7))
And if we zoom in on the left tail (very bad days!), we see the problem with Gaussian models: large negative returns basically have a zero probability of occurrence.
dens2 <- function(x){
dnorm(x, mean = m, sd = s)/mean(returns_d$SPY < (-0.02) & returns_d$SPY > (-0.12), na.rm = TRUE) # Need to rescale
}
returns_d %>% ggplot() + # Plot
geom_histogram(aes(x = SPY, y = ..density..), bins = 100, alpha = 0.6) + theme_light() +
stat_function(fun = dens2, aes(color = "Gaussian")) +
theme(legend.position = c(0.4, 0.4)) +
xlim(-0.12,-0.02)
And it is precisely because people use Gaussian models that they underestimate the likelihood of extreme risks.
For instance, consider daily returns with a mean of zero (this does not change the results) and an annual volatility (standard deviation) of 15%, i.e., roughly 1% daily (we divide by the square root of 252, the number of trading days in one year in the US). Then, a return of -4% or worse has a probability of \(3.2\times 10^{-5}\), which means once every 125 years or so. But in practice, these levels occur almost every year… This is why heavy-tailed distributions, such as the Student distribution (see below), the stable distribution, or even the generalized hyperbolic distribution, are much better choices.
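These orders of magnitude are easy to verify. Below is a quick sanity check under the zero-mean Gaussian assumption with 1% daily volatility (the last line relies on the returns_d object created above).

```r
p_tail <- pnorm(-0.04, mean = 0, sd = 0.01) # P[daily return <= -4%] under the Gaussian
p_tail                                      # ~ 3.2e-05
1 / (p_tail * 252)                          # expected waiting time in years: ~ 125
mean(returns_d$SPY <= -0.04) * 252          # empirical frequency per year: much higher
```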
dens2 <- function(x){
dnorm(x, mean = m, sd = s)/mean(returns_d$SPY < (-0.02) & returns_d$SPY > (-0.12), na.rm = TRUE) # Need to rescale
}
dens_t <- function(x, m, s, df){dt((x-m)/s, df)/s}
dens3 <- function(x){
dens_t(x, m, s, 5)/mean(returns_d$SPY < (-0.02) & returns_d$SPY > (-0.12), na.rm = TRUE) # Need to rescale
}
returns_d %>% ggplot() + # Plot
geom_histogram(aes(x = SPY, y = ..density..), bins = 100, alpha = 0.6) + theme_light() +
stat_function(fun = dens2, aes(color = "Gaussian")) +
stat_function(fun = dens3, aes(color = "Student (df = 5)")) +
theme(legend.position = c(0.4, 0.4)) +
xlim(-0.12,-0.02)
The reasons underpinning sharp negative returns are diverse:
- overvaluation (tech bubble ~2001);
- systemic problem (subprimes ~2008);
- an unforeseen pandemic (Covid in 2020);
- wars & major geopolitical shocks (Ukraine in 2022).
None of these causes was (or could have been) predicted by standard economic models or statistical tools.
1.2 Climate risks
Until now, we have mostly focused on the frequency of adverse events. Gaussian models underestimate this frequency, whereas the data tells us that significant risks occur more often.
Another angle on extreme risks is severity. Indeed, the expected loss over a given horizon is the loss times its probability of occurrence. Hence, the magnitude of extreme events matters too, of course.
This is where climate risks kick in. Heatwaves, droughts, floods, hurricanes, etc. All these weather-induced catastrophes have an impact on economic activity. In the past, they were relatively rare indeed, but as climate change intensifies, they may become not only more frequent, but also more devastating.
We provide below two illustrations (see references below). First, a visual summary of disasters across the globe in 2021 alone.
And next, the quantification of damage in terms of human lives and infrastructure costs.
Important note: climate change amplifies risks, but is far from the only factor!
- Extreme weather impacts of climate change: an attribution perspective.
- Harbingers of decades of unnatural disasters
The link with the economy and/or financial markets is investigated or surveyed in:
2 Consumption disaster risk models
2.1 Continuous time
Here, we follow Tsai & Wachter 2015.
In financial economics, everyday uncertainty is modelled (in continuous time) with Brownian motions, often written \(B_t\) - here in one dimension.
A rare event requires a jump process, and the most straightforward way to proceed is with a Poisson process, \(N_t\). This process has independent increments but depends on a parameter, the intensity \(\lambda\) (or \(\lambda_t\) if it is time-dependent). When \(\lambda\) is fixed, \(\mathbb{E}[N_t]=\lambda t\) and \(\mathbb{P}[N_t=n]=\frac{(\lambda t)^n}{n!}e^{-\lambda t}\).
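This distribution is built into R via the *pois family of functions; a quick numerical check of the formula, with arbitrary illustrative values \(\lambda = 0.5\) and \(t = 2\):

```r
lambda <- 0.5; t <- 2; n <- 0:4                            # arbitrary illustrative values
manual <- (lambda * t)^n / factorial(n) * exp(-lambda * t) # P[N_t = n] from the formula
all.equal(manual, dpois(n, lambda * t))                    # TRUE: matches the built-in pmf
mean(rpois(1e5, lambda * t))                               # sample mean ~ E[N_t] = lambda * t = 1
```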
First, we consider a state variable, which is often taken to be aggregate consumption (again!). It is assumed to be driven by the following stochastic differential equation (SDE):
\[\frac{dC_t}{C_{t-}}=\mu dt + \sigma dB_t + (e^Z-1)dN_t, \tag{1}\]
where:
- \(\mu\) is the drift, i.e., the deterministic rate of increase;
- \(\sigma\) is the everyday volatility;
- \((e^Z-1)dN_t\) models the extreme events. The random variable \(Z<0\) drives the distribution of the loss and \(N_t\) its occurrence and frequency;
- the notation \(t-\) simply means before the occurrence of a potential jump at time \(t\).
Next, Tsai & Wachter 2015 make a strong assumption: \(D_t=C_t^\phi\) with \(\phi >0\), i.e., dividends are a power function of consumption. Using Ito’s lemma for jump-diffusion processes1, we derive the dynamics of dividends:
\[\frac{dD_t}{D_{t-}}=\left(\phi \mu + \frac{1}{2}\phi(\phi-1)\sigma^2 \right)dt + \phi \sigma dB_t + (e^{\phi Z}-1)dN_t, \tag{2}\]
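As a sanity check of the drift in Equation 2, we can simulate one small step of the continuous part (jumps switched off), with arbitrary illustrative values \(\mu=2\%\), \(\sigma=2\%\) and \(\phi=2\):

```r
set.seed(42)
mu <- 0.02; sigma <- 0.02; phi <- 2           # arbitrary illustrative values
dt <- 1/252                                   # one trading day
dB <- rnorm(1e6, mean = 0, sd = sqrt(dt))     # Brownian increments
C1 <- exp((mu - sigma^2/2) * dt + sigma * dB) # one step of C (C_0 = 1, no jump)
mean(C1^phi - 1) / dt                         # simulated drift of D = C^phi
phi * mu + 0.5 * phi * (phi - 1) * sigma^2    # theoretical drift: both ~ 0.0404
```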
Lastly, \(S_t\) is the price of the dividend claim and it is assumed (for simplicity again) that \(S_t/D_t=c\) is constant. Hence
\[\frac{dS_t}{S_{t-}}=\mu_S dt + \phi \sigma dB_t + (e^{\phi Z}-1)dN_t, \tag{3}\]
where the drift \(\mu_S=\mu_D\).
Equation 3 defines the infinitesimal return of the asset. If we add the deterministic continuously-paid dividend yield, we get the average instantaneous return:
\[r_S=\mathbb{E}\left[ \frac{dS_t+D_t}{S_{t-}}\right]=\mu_S +\lambda \mathbb{E}[e^{Z}-1]+c^{-1}\]
It can then be shown (see the Appendix of Tsai & Wachter 2015 for the complete proof) that the equity premium (over the risk-free rate \(r\)) is
\[r_S-r=\gamma \phi \sigma^2-\lambda \mathbb{E}[(e^{-\gamma Z}-1)(e^{\phi Z}-1)], \tag{4}\]
where \(\gamma\) is the CRRA risk aversion (utility over consumption). During periods without disasters, the premium is \[r_S-r=\gamma \phi \sigma^2-\lambda \mathbb{E}[e^{-\gamma Z}(e^{\phi Z}-1)],\] which is higher because \(Z<0\). If there is no disaster risk at all, the premium is simply \(\gamma \phi \sigma^2\).
Now, let us turn to actual numbers, i.e., rough estimates of the values in Equation 4. In the US, the (annualized) equity premium oscillates between 4% and 5% (the left-hand side - lhs - of Equation 4). For simplicity, most papers fix \(\phi = 1\). For \(\sigma\), Tsai & Wachter 2015 mention a value of 2%, but on recent data (1980-2022), the standard deviation of consumption growth is around 1% (this figure is relatively stable post-1950). If we omit the term coming from extreme risks (jumps), this would require an implausible level of risk aversion of \(0.04/(0.01)^2=400\).
Luckily, there is the last term. Tsai & Wachter 2015 make the following reasoning. During the 1929 Great Depression, consumption fell by 25%. Since it is a once-in-a-century event (probability of occurrence of 1%), the expectation term in the premium is of the order of 0.25%. This is in line with the Covid crisis (also a 25% drop in consumption). A drop of 25% implies \(Z=-0.288\). With \(\gamma = 10\) and \(\lambda = 0.01\), we get that the last term is \[-0.01\times (e^{10\times0.288}-1)\times(e^{-0.288}-1)=4.2\%.\]
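This back-of-the-envelope computation is easily reproduced:

```r
gamma_  <- 10            # risk aversion
lambda_ <- 0.01          # one disaster per century on average
Z       <- log(1 - 0.25) # 25% consumption drop => Z ~ -0.288
-lambda_ * (exp(-gamma_ * Z) - 1) * (exp(Z) - 1) # ~ 0.042, i.e., 4.2% (with phi = 1)
```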
There are of course debates around this parametrization, but it solves the C-CAPM problem - thanks to extreme risks.
2.2 Discrete time
An earlier, similar approach is found in Rare Disasters, Asset Prices, and Welfare Costs and On the Size Distribution of Macroeconomic Disasters. Therein, the state variable is both consumption and production (they are conveniently equal), denoted \(Y_t\) and such that \[\log(Y_{t+1})=\log(Y_{t})+g+u_{t+1}+v_{t+1},\]
where \(g\) is some exogenous productivity growth rate and:
- \(u_{t+1}\) is iid Gaussian with zero mean and variance \(\sigma^2\); it reflects randomness in usual growth conditions;
- \(v_{t+1}\) pertains to disasters; it is equal to zero with probability \(1-p\) and to \(\log(1-b)\) with probability \(p\) (disaster with random magnitude \(b \in (0,1]\)).
All in all, the average growth rate is \(g^*=g+\sigma^2/2-p \mathbb{E}[b]\). And in this case, if the utility is proportional to \(Y^{1-\gamma}\), the equity premium is
\[r_S-r = \gamma \sigma^2+p \mathbb{E}[b((1-b)^{-\gamma}-1)] \tag{5}\]
See Rare Disasters, Asset Prices, and Welfare Costs for the full proof.
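A quick simulation corroborates the expression for \(g^*\). The values below (\(g=2.5\%\), \(\sigma=2\%\), \(p=1.7\%\), a fixed disaster size \(b=0.29\)) are Barro-like but purely illustrative:

```r
set.seed(42)
n <- 1e6
g <- 0.025; sigma <- 0.02; p <- 0.017; b <- 0.29 # illustrative calibration (assumed)
u <- rnorm(n, mean = 0, sd = sigma)              # regular Gaussian shocks
v <- ifelse(runif(n) < p, log(1 - b), 0)         # disasters with probability p
log(mean(exp(g + u + v)))                        # simulated average growth rate
g + sigma^2/2 - p * b                            # g*: both ~ 0.020
```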
In On the Size Distribution of Macroeconomic Disasters, the authors carry out an estimation exercise of the random (disaster) variable \(b\).
Let us have a look at some data. The main issue here is that disasters in economic data (consumption and GDP growth) are actually rare. For instance, in the US, strong contractions (more severe than -10%) occur at most once per decade. Hence, to build a distribution, one country will not be enough: we need more.
library(WDI) # Package that accesses World Bank data
wb_data <- WDI( # World Bank data
indicator = c("gdp" = "NY.GDP.MKTP.CD",
"stock" = "CM.MKT.INDX.ZG",
"cap" = "CM.MKT.LCAP.CD"),
extra = T,
start = 1960,
end = 2022)
gdp_data <- wb_data |>
filter(is.finite(gdp)) |>
group_by(country) |>
mutate(b = 1-gdp/dplyr::lag(gdp),
gdp_growth = -b) |>
ungroup() |>
filter(gdp_growth < 5)
Defining disasters is not obvious. Peak-to-trough losses (maximum drawdowns) are one option. Above, we computed annual growth rates; below, we select at most one observation per calendar decade that falls below a -10% contraction. Because we have a large panel of countries, this generates enough points.
disasters <- gdp_data |>
filter(gdp_growth < -0.1) |>
mutate(yea = substr(year, 1,3)) |> # Only one per calendar decade
na.omit()
disasters <- disasters[!duplicated(disasters |> dplyr::select(country,yea)),]
# temp <- tempfile()
# link <- "https://raw.githubusercontent.com/shokru/carbon_emissions/main/country_codes.csv"
# download.file(link, temp, quiet = TRUE)
# codes <- read_csv(temp)
disasters <- disasters |>
#filter(iso3c %in% codes$`alpha-3`) |>
mutate(transformed = 1/(1-b))
disasters |> head()
| country | iso2c | iso3c | year | status | lastupdated | gdp | stock | cap | region | capital | longitude | latitude | income | lending | b | gdp_growth | yea | transformed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Argentina | AR | ARG | 2003 | | 2024-12-16 | 127586973492 | 131.3899994 | 34994620000 | Latin America & Caribbean | Buenos Aires | -58.4173 | -34.6118 | Upper middle income | IBRD | 0.4423132 | -0.4423132 | 200 | 1.793121 |
| Argentina | AR | ARG | 2020 | | 2024-12-16 | 385740508437 | 123.2113992 | 32250420000 | Latin America & Caribbean | Buenos Aires | -58.4173 | -34.6118 | Upper middle income | IBRD | 0.3514233 | -0.3514233 | 202 | 1.541838 |
| Australia | AU | AUS | 1994 | | 2024-12-16 | 322806641301 | 0.4104552 | 218865190000 | East Asia & Pacific | Canberra | 149.129 | -35.282 | High income | Not classified | 0.6224517 | -0.6224517 | 199 | 2.648668 |
| Australia | AU | AUS | 2002 | | 2024-12-16 | 395788696012 | -2.1270626 | 380087000000 | East Asia & Pacific | Canberra | 149.129 | -35.282 | High income | Not classified | 0.7071044 | -0.7071044 | 200 | 3.414186 |
| Austria | AT | AUT | 2001 | | 2024-12-16 | 196477206829 | -5.3361815 | 25204350000 | Europe & Central Asia | Vienna | 16.3798 | 48.2201 | High income | Not classified | 0.5394135 | -0.5394135 | 200 | 2.171145 |
| Austria | AT | AUT | 1996 | | 2024-12-16 | 235952581232 | 2.1600212 | 31914600000 | Europe & Central Asia | Vienna | 16.3798 | 48.2201 | High income | Not classified | 0.3899253 | -0.3899253 | 199 | 1.639144 |
We notice some strange numbers, e.g., a -14% disaster for France in 2015.
Any idea where this comes from?
\(\rightarrow\) the EUR/USD change! Maybe it would make more sense to keep GDP data in local currency.
power_fun <- function(x, a) {
x^(-a)*(a-1)
}
disasters |>
filter(gdp_growth < -0.1, transformed > 1.05, transformed < 4) |>
ggplot(aes(x = transformed)) + geom_histogram(aes(y = ..density..), fill = "#2299FF", alpha = 0.6) +
theme_classic() +
geom_function(fun = power_fun, args = list(a = 6.86+1))
Let’s compare with the original plot from the paper (same power function depicted). It’s quite close, though we have virtually zero points for transformed values above 2. We consider the left bound/threshold to be \(z=1\).
Choosing a power law that starts at \(z=1\) allows us to obtain a very simple expression for Equation 5:
\[r_S-r = \gamma \sigma^2+p \left(\frac{\alpha}{\alpha-\gamma}-\frac{\alpha}{\alpha+1-\gamma}+\frac{\alpha}{\alpha+1}-1 \right)\]
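The bracketed term can be double-checked by integrating directly against the power-law density \(\alpha z^{-\alpha-1}\) on \([1,\infty)\), using the change of variable \(z=1/(1-b)\) (the values \(\alpha=7\) and \(\gamma=4\) are purely illustrative):

```r
a <- 7; gam <- 4                                   # illustrative power index & risk aversion
dens_pl <- function(z) a * z^(-a - 1)              # power-law density on [1, Inf)
integrand <- function(z) (1 - 1/z) * (z^gam - 1) * dens_pl(z) # b = 1 - 1/z, (1-b)^(-gam) = z^gam
integrate(integrand, lower = 1, upper = Inf)$value # numerical value of E[b((1-b)^(-gam) - 1)]
a/(a - gam) - a/(a + 1 - gam) + a/(a + 1) - 1      # closed form: both ~ 0.458
```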
Equipped with this formula, we can try to figure out how to reach a 4% premium - shown with the horizontal grey dashed line in the graph below.
Below, we assume a frequency of disaster of \(p=0.03\) and a risk aversion of \(\gamma \in \{4,5\}\) and a volatility of GDP growth of 2%.
Plainly, the level of risk aversion has to be linked with the power index.
premium <- function(a, gamma, s, p){
gamma*s^2+p*(a/(a-gamma)-a/(a+1-gamma)+a/(a+1)-1)
}
data.frame(a=c(6,8)) |>
ggplot(aes(x = a)) + theme_classic() +
geom_function(fun = premium, args = list(gamma = 5, s = 0.02, p = 0.03), aes(color = "gamma = 5")) +
geom_function(fun = premium, args = list(gamma = 4, s = 0.02, p = 0.03), aes(color = "gamma = 4")) +
geom_hline(yintercept = 0.04, linetype = 2, color = "#999999") + xlab("power index") +
theme(axis.title.y = element_blank(),
legend.title = element_blank(),
legend.position = c(0.7,0.7))
3 Climate disaster
What follows is taken from the paper Climate change risk by Bansal, Kiku and Ochoa.
3.1 The economy
The economy is modelled such that consumption growth is \[\Delta c_{t+1}=\log(C_{t+1}/C_t)=\mu+\sigma_\eta \eta_{t+1}+D_{t+1}, \tag{6}\]
where \(\eta_{t+1}\) is the consumption growth shock and \(D_{t+1}= N_{t+1}d\) is the climate damage, modelled as the product of a Poisson process that counts the number (occurrences) of climate events and \(d<0\), the corresponding decline in consumption growth. The intensity of the Poisson process increases with the temperature anomaly \(T_t\): \[\pi_t = l_0+l_1T_t, \quad l_0,l_1>0.\]
To close the loop, temperature is tied to economic activity: \(T_{t+1}=\chi E_{t+1}\), with \(E_{t+1}\) being the total carbon emissions, which follow the autoregressive dynamics: \[E_{t+1}=\nu E_{t}+\iota (\mu +\sigma_\eta\eta_{t+1})+ \sigma_\zeta \zeta_{t+1},\] where \(\chi>0\) is the sensitivity of temperature to emissions, \(\iota>0\) is the carbon intensity of consumption and \(\nu \in (0,1)\) is a persistence parameter. \(\zeta_{t+1}\) is an exogenous innovation.
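To see the feedback loop at work, here is a minimal simulation of the three equations above. All parameter values below are made up for illustration, not calibrated:

```r
set.seed(42)
n_per <- 200                             # number of periods
mu <- 0.02; s_eta <- 0.015; d <- -0.05   # growth, volatility, damage per climate event (assumed)
l0 <- 0.02; l1 <- 0.5; chi <- 0.5        # intensity parameters & temperature sensitivity (assumed)
nu <- 0.9; iota <- 1; s_zeta <- 0.05     # emission dynamics (assumed)
E <- numeric(n_per); dc <- numeric(n_per) # emissions & consumption growth paths
for (t in 2:n_per) {
  eta <- rnorm(1); zeta <- rnorm(1)                                 # shocks
  E[t] <- nu * E[t - 1] + iota * (mu + s_eta * eta) + s_zeta * zeta # emissions (AR dynamics)
  N <- rpois(1, max(l0 + l1 * chi * E[t - 1], 0))                   # events: intensity l0 + l1 * T
  dc[t] <- mu + s_eta * eta + N * d                                 # consumption growth (Eq. 6)
}
mean(dc[-1]) # average growth: pulled below mu by climate damages
```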
3.2 The representative agent
The utility is modelled with Epstein-Zin preferences:
\[U_t=\left((1-\delta)C_t^{1-\psi^{-1}}+\delta \mathbb{E}_t\left[U_{t+1}^{1-\gamma}\right]^{(1-\psi^{-1})/(1-\gamma)} \right)^{1/(1-\psi^{-1})}\] In this case, the stochastic discount factor is \[m_{t+1}=\theta \log(\delta)-\frac{\theta}{\psi}\Delta c_{t+1}+(\theta-1)r_{c,t+1}, \quad \theta = \frac{1-\gamma}{1-\psi^{-1}}\]
where \(r_{c,t+1}\) is the return on wealth.
Now, we also have individual stocks, indexed by \(i\), with dividend growth given by \[\Delta d_{i,t+1}=\phi_i\Delta c_{t+1}+\sigma u_{i,t+1},\] where \(\Delta c\) is given by Equation 6 and \(u_{i,t}\) is a Gaussian white noise. It can then be shown (though it is long and we refer to the article for details) that the conditional excess return of asset \(i\) is
\[\mathbb{E}_t[\log(r_{i,t+1})]-r_f=\beta_{i,\eta} \lambda_\eta\sigma^2_\eta+\beta_{i,\zeta}\lambda_\zeta\sigma_\zeta^2+\beta_{i,D}\lambda_D(l_0+l_1T_t)\]
Recall that \(\sigma_\zeta^2\) is the volatility of emissions and \(\sigma^2_\eta\) is the variability of consumption growth that is unrelated to climate risk.
In the paper, the authors empirically test the predictions of the model by running the following regressions:
\[r_{i,t,k}=a_i+\beta_{i,T}\Delta_k T_t+\beta_{i,m}r_{m,t,k}+\beta_{i,c}\Delta_kc_t+e_{i,t,k},\]
where the subscript \(k\) refers to \(k\)-period quantities. \(r_{m,t,k}\) is the excess return on the aggregate stock market and serves as the main control variable.
3.3 Data
First, we set the horizon.
k <- 3
The gathering of all variables requires several sources.
We start with a source that lists US states.
states <- read_csv("https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv")
states <- states |> filter(State != "District of Columbia") |> pull(State)
We then fetch the data for mainland temperature anomalies and format the result (we exclude Alaska and Hawaii). It comes from the National Centers for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA).
data_temp <- c()
for(j in 1:48){
data_temp <- data_temp |> bind_rows(
read_csv(paste0("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/statewide/time-series/",
j,
"/tavg/all/4/1895-2024.csv?base_prd=true&begbaseyear=1895&endbaseyear=1945"),
skip = 4) |>
mutate(state = states[j] |> as.character())
)
}
data_temp <- data_temp |>
rename(date = Date) |>
mutate(date = make_date(year = substr(date, 1, 4) |> as.numeric(),
month = substr(date, 5, 6) |> as.numeric(),
day = 28)) |>
group_by(date) |>
mutate(anom = mean(Anomaly)) |>
ungroup() |>
filter(date > "1945-01-01")
data_temp |> head()
| date | Value | Anomaly | state | anom |
|---|---|---|---|---|
| 1945-01-28 | 44.8 | -1.0 | Alabama | -1.268750 |
| 1945-02-28 | 51.4 | 3.7 | Alabama | 2.450000 |
| 1945-03-28 | 63.4 | 7.8 | Alabama | 6.466667 |
| 1945-04-28 | 65.4 | 2.5 | Alabama | 1.079167 |
| 1945-05-28 | 68.4 | -2.7 | Alabama | -2.462500 |
| 1945-06-28 | 78.2 | -0.1 | Alabama | -1.912500 |
We then keep only one aggregated series.
data_temp <- data_temp |>
filter(state == "Alabama") |>
select(date, anom) |>
mutate(ym = paste0(year(date), month(date)),
d_anom = anom/lag(anom, k) - 1)
For the aggregate stock market returns, there are several possible choices, like Ken French’s data library. Below, we recycle the data from the previous exercise.
# y <- getSymbols("SPY", src = 'yahoo', # Data from Yahoo Finance
# from = min_date, # Start date
# to = max_date, # End date
# auto.assign = TRUE,
# warnings = FALSE) %>%
# map(~Ad(get(.))) %>% # Retrieving the data
# reduce(merge)
spy <- SPY %>% # Start from prices
to.monthly(indexAt = "lastof", OHLC = FALSE) %>% # Convert to monthly
data.frame(Date = index(.)) %>% # Convert the index to a date
remove_rownames() |>
select(Date, SPY.Close) |>
rename(date = 1, spy = 2) |>
mutate(ym = paste0(year(date), month(date)),
spy = spy/lag(spy, k) - 1) |>
na.omit()
spy |> head()
| | date | spy | ym |
|---|---|---|---|
| 4 | 1993-04-30 | 0.0021337 | 19934 |
| 5 | 1993-05-31 | 0.0182970 | 19935 |
| 6 | 1993-06-30 | -0.0027663 | 19936 |
| 7 | 1993-07-31 | 0.0184528 | 19937 |
| 8 | 1993-08-31 | 0.0297167 | 19938 |
| 9 | 1993-09-30 | 0.0194175 | 19939 |
Next, consumption data (from previous session).
temp <- tempfile()
link <- "https://fred.stlouisfed.org/graph/fredgraph.xls?id=PCE"
download.file(link, temp, quiet = TRUE)
cons_data <- read_excel(temp, sheet = 2)
link <- "https://fred.stlouisfed.org/graph/fredgraph.xls?id=PCEND"
download.file(link, temp, quiet = TRUE)
cons_data <- cons_data |>
left_join(read_excel(temp, sheet = 2), by = "observation_date") |>
mutate(observation_date = as.Date(observation_date),
observation_date = observation_date - 1 # Last day of the month
) |>
rename(date = observation_date) |>
mutate(ym = paste0(year(date), month(date)),
D_pce = PCE/lag(PCE) - 1) |>
filter(date > min_date)
cons_data |> head()
| date | PCE | PCEND | ym | D_pce |
|---|---|---|---|---|
| 1990-01-31 | 3728.2 | 975.7 | 19901 | -0.0006701 |
| 1990-02-28 | 3754.9 | 981.5 | 19902 | 0.0071616 |
| 1990-03-31 | 3770.0 | 976.5 | 19903 | 0.0040214 |
| 1990-04-30 | 3775.8 | 977.4 | 19904 | 0.0015385 |
| 1990-05-31 | 3804.5 | 988.6 | 19905 | 0.0076010 |
| 1990-06-30 | 3821.7 | 991.1 | 19906 | 0.0045210 |
Finally, let’s find some individual stock data to run the regressions.
tickers <- c("AAPL", "BA", "C", "CAT", "CVX", "DIS", "F", "GE", "MCD", "MMC",
"MSFT", "PFE", "WMT", "XOM")
stocks <- getSymbols(tickers, src = 'yahoo', # Data from Yahoo Finance
from = min_date, # Start date
to = max_date, # End date
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>% # Retrieving the data
reduce(merge) %>% # Merge in one dataframe
`colnames<-`(tickers) # Set the column names
stocks |> head() # Have a look at the result!
AAPL BA C CAT CVX DIS F
1990-01-02 0.2621294 11.04363 11.37119 3.208576 4.690282 6.883252 2.272356
1990-01-03 0.2638886 11.31298 11.51635 3.242635 4.613950 6.898149 2.284911
1990-01-04 0.2647689 11.26809 11.37119 3.256261 4.554578 6.890698 2.284911
1990-01-05 0.2656485 11.11097 11.46796 3.235824 4.486725 6.913046 2.253524
1990-01-08 0.2674076 11.24565 11.56474 3.215387 4.529133 6.972406 2.253524
1990-01-09 0.2647689 11.11097 11.41958 3.194950 4.495208 6.964950 2.215862
GE MCD MMC MSFT PFE WMT XOM
1990-01-02 11.39443 4.387938 5.429225 0.3792593 0.9492189 1.194632 3.866511
1990-01-03 11.37310 4.340756 5.394423 0.3813966 0.9525614 1.194632 3.827848
1990-01-04 11.30908 4.262119 5.342221 0.3926139 0.9676015 1.188294 3.789182
1990-01-05 11.20239 4.183481 5.227493 0.3829989 0.9592459 1.175619 3.769849
1990-01-08 11.26640 4.262119 5.192644 0.3888744 0.9508892 1.191463 3.827848
1990-01-09 11.03169 4.230666 5.297192 0.3878061 0.9308360 1.159775 3.750517
rm(list = tickers) # Remove unnecessary variables from the env.
returns_m <- stocks %>% # Start from prices
to.monthly(indexAt = "lastof", OHLC = FALSE) %>% # Convert to monthly
data.frame(Date = index(.)) %>% # Convert the index to a date
remove_rownames() %>% # Remove index => converted into row names
pivot_longer(-Date, names_to = "Asset",
values_to = "Price") %>% # Put the data in 'tidy' format
group_by(Asset) %>% # Group the rows by asset to compute returns
mutate(return = Price / dplyr::lag(Price, k) - 1, # Compute the returns
return = return |> round(4)) %>% # To ease presentation
select(-Price) %>% # Remove price column
na.omit() # Discard rows with missing/NA data
Lastly, we aggregate all the variables together.
data <- returns_m |>
mutate(ym = paste0(year(Date), month(Date))) |>
left_join(cons_data, by = "ym") |>
left_join(spy, by = "ym") |>
left_join(data_temp, by = "ym") |>
select(Date, Asset, return, D_pce, spy, d_anom) |>
na.omit()
data |> head()
| Date | Asset | return | D_pce | spy | d_anom |
|---|---|---|---|---|---|
| 1993-04-30 | AAPL | -0.1368 | 0.0066008 | 0.0021337 | -1.524778 |
| 1993-04-30 | BA | 0.0856 | 0.0066008 | 0.0021337 | -1.524778 |
| 1993-04-30 | C | 0.3046 | 0.0066008 | 0.0021337 | -1.524778 |
| 1993-04-30 | CAT | 0.2368 | 0.0066008 | 0.0021337 | -1.524778 |
| 1993-04-30 | CVX | 0.1979 | 0.0066008 | 0.0021337 | -1.524778 |
| 1993-04-30 | DIS | -0.1056 | 0.0066008 | 0.0021337 | -1.524778 |
3.4 Results
We can then proceed with the regressions. We run one regression per stock and tidy the output thanks to the {broom} package.
models <- data |>
group_by(Asset) |>
group_modify(.f = ~ broom::tidy(lm(return ~ spy + D_pce + d_anom,
data = .x)))
models |> head(9)
| Asset | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| AAPL | (Intercept) | 0.0461960 | 0.0114326 | 4.0407087 | 0.0000647 |
| AAPL | spy | 1.2839437 | 0.1442959 | 8.8979925 | 0.0000000 |
| AAPL | D_pce | 0.0591684 | 1.0586790 | 0.0558889 | 0.9554600 |
| AAPL | d_anom | 0.0000551 | 0.0004475 | 0.1231795 | 0.9020309 |
| BA | (Intercept) | 0.0080274 | 0.0076397 | 1.0507467 | 0.2940516 |
| BA | spy | 1.1808174 | 0.0964241 | 12.2460805 | 0.0000000 |
| BA | D_pce | -0.0560003 | 0.7074504 | -0.0791579 | 0.9369492 |
| BA | d_anom | 0.0001460 | 0.0002990 | 0.4883169 | 0.6256108 |
| C | (Intercept) | -0.0141572 | 0.0075351 | -1.8788423 | 0.0610415 |
The interesting coefficient is the one related to temperature changes.
Exercise: when \(k>1\), the variables are autocorrelated. Hence, HAC estimators for the standard errors are better choices… see the {sandwich} package.
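As a sketch of what the exercise involves for a single stock (assuming the {sandwich} and {lmtest} packages are installed; the data object and horizon k come from above):

```r
library(sandwich) # HAC covariance matrix estimators
library(lmtest)   # coeftest() for inference with a custom vcov
fit_aapl <- lm(return ~ spy + D_pce + d_anom,
               data = data |> filter(Asset == "AAPL"))
coeftest(fit_aapl, vcov = NeweyWest(fit_aapl, lag = k - 1)) # Newey-West (HAC) std. errors
```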
models |>
filter(term == "d_anom") |>
ggplot(aes(x = estimate,
y = reorder(Asset, estimate),
fill = if_else(p.value < 0.05, "signif.", "non signif."))) + geom_col() +
theme_classic() +
theme(axis.title = element_blank(),
legend.position = c(0.7,0.3),
legend.title = element_blank())
In the original paper, the authors use portfolio (aggregate) returns on the lhs of the equation.
They find mostly negative coefficients, with varying degrees of significance:
The numbers are those of the last 2 columns.
3.5 Asset pricing with crises
Inspired from Time-varying rare disaster risk and stock returns, we fetch data from the International Crisis Behavior Project.
This is a purely empirical contribution: there is no theoretical model backing the regression specifications.
In the paper, the authors test the link between global returns and crises: \[r_t=a + b c_t+e_t,\] where \(c_t\) is the number of crises during the period.
crises <- read.csv("icb1v15.csv")
crises <- crises |>
select(crisname, yrtrig, motrig, yrterm, moterm)
crises |> head(6)
| crisname | yrtrig | motrig | yrterm | moterm |
|---|---|---|---|---|
| RUSSIAN CIVIL WAR I | 1918 | 5 | 1920 | 4 |
| COSTA RICAN COUP | 1918 | 5 | 1919 | 9 |
| RUSSIAN CIVIL WAR II | 1918 | 6 | 1919 | 9 |
| BALTIC INDEPENDENCE | 1918 | 11 | 1920 | 8 |
| TESCHEN | 1919 | 1 | 1920 | 7 |
| HUNGARIAN WAR | 1919 | 3 | 1919 | 8 |
crises <- crises |> filter(yrtrig > 1980) # Keep only the recent crises
Now, we have to translate this data into a column that counts the number of crises.
counting <- function(start_end, dates){
start <- substr(start_end, 1, 6) |> as.numeric()
end <- substr(start_end, 8, 13) |> as.numeric()
tibble(date = as.Date(dates)) |>
mutate(y = year(date),
m = month(date),
m = if_else(nchar(m)==2, as.character(m), paste0("0",m)), # Add "0" if need be
ym = paste0(y, m) |> as.numeric(),
crisis = (ym >= start) * (ym <= end))
}
counting("199307_199401", spy$date) |> head(12) # The format is begin_end of crisis
| date | y | m | ym | crisis |
|---|---|---|---|---|
| 1993-04-30 | 1993 | 04 | 199304 | 0 |
| 1993-05-31 | 1993 | 05 | 199305 | 0 |
| 1993-06-30 | 1993 | 06 | 199306 | 0 |
| 1993-07-31 | 1993 | 07 | 199307 | 1 |
| 1993-08-31 | 1993 | 08 | 199308 | 1 |
| 1993-09-30 | 1993 | 09 | 199309 | 1 |
| 1993-10-31 | 1993 | 10 | 199310 | 1 |
| 1993-11-30 | 1993 | 11 | 199311 | 1 |
| 1993-12-31 | 1993 | 12 | 199312 | 1 |
| 1994-01-31 | 1994 | 01 | 199401 | 1 |
| 1994-02-28 | 1994 | 02 | 199402 | 0 |
| 1994-03-31 | 1994 | 03 | 199403 | 0 |
counting_col <- function(start_end, dates){
counting(start_end, dates) |> pull(crisis)
}
Okay, so we just need the correct input and to loop on all crises.
To do so, we will resort to functional programming, with the map() function from the {purrr} package.
We want to stack the output by columns, hence we use the map_dfc() alternative.
crises <- crises |>
mutate(motrig = if_else(nchar(motrig)==2, as.character(motrig), paste0("0", motrig)), # Add zeros
moterm = if_else(nchar(moterm)==2, as.character(moterm), paste0("0", moterm)),
start_end = paste0(yrtrig, motrig, "_", yrterm, moterm))
head(crises, 5)
| crisname | yrtrig | motrig | yrterm | moterm | start_end |
|---|---|---|---|---|---|
| CHAD/LIBYA V | 1981 | 01 | 1981 | 11 | 198101_198111 |
| ECUADOR/PERU BDR. III | 1981 | 01 | 1981 | 04 | 198101_198104 |
| MOZAMBIQUE RAID | 1981 | 01 | 1981 | 03 | 198101_198103 |
| IRAQ NUCLEAR REACTOR | 1981 | 01 | 1981 | 06 | 198101_198106 |
| ESSEQUIBO II | 1981 | 04 | 1983 | 03 | 198104_198303 |
crisis_mat <- map_dfc(crises$start_end, counting_col, dates = spy$date)
All we need to do is sum the rows of the matrix.
crises <- tibble(date = spy$date,
n = crisis_mat |> rowSums(na.rm = T)) |>
mutate(year = year(date),
month = month(date))
crises |> ggplot(aes(x = date, y = n)) + geom_line() +
theme_light() + theme(axis.title = element_blank())
First, let’s see what happens for a local economy, the US.
We use the {broom} package for coefficient output (the tidy() function) and fit statistics (the glance() function).
data_crises <- spy |>
mutate(year = year(date),
month = month(date)) |>
left_join(crises |> select(-date), # Remove date which will be duplicate
by = c("year", "month")) |>
select(date, spy, n) |>
mutate(n = if_else(is.na(n), 0, n)) |>
filter(year(date) < 2022)
fit <- lm(spy ~ n, data = data_crises)
fit |> tidy()
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.0391420 | 0.0076716 | 5.102208 | 0.0000006 |
| n | -0.0080063 | 0.0033364 | -2.399665 | 0.0169429 |
fit |> glance()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0165111 | 0.0136438 | 0.0727781 | 5.758393 | 0.0169429 | 1 | 415.4866 | -824.9733 | -813.4426 | 1.81675 | 343 | 345 |
The signs are the same as in Time-varying rare disaster risk and stock returns (positive intercept, negative coefficient), but only the intercept has a truly negligible \(p\)-value (the slope is significant at the 5% level only). And the \(R^2\) is very small.
Next, let’s try for a synthetic global asset (world market). The data is sampled yearly.
And there are months with zero crisis as well.
fit <- lm(return ~ n,
data = wb_data |>
filter(is.finite(cap), is.finite(stock)) |>
group_by(year) |>
summarise(return = sum(cap*stock/100)/sum(cap)) |>
left_join(crises |> group_by(year) |> summarise(n = max(n, na.rm = T)),
by = "year") |> na.omit()
)
fit |> tidy()
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.0970072 | 0.0804463 | 1.2058641 | 0.2379613 |
| n | -0.0022074 | 0.0245169 | -0.0900363 | 0.9288994 |
fit |> glance()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0002894 | -0.0354145 | 0.1882645 | 0.0081065 | 0.9288994 | 1 | 8.563955 | -11.12791 | -6.924319 | 0.9924188 | 28 | 30 |
Again, a negative coefficient, but no statistical significance (smaller sample size?) and a negative adjusted \(R^2\).
It would be interesting to investigate lead-lag effects. This is left as an exercise.
Footnotes
See Proposition 20.15 in Stochastic calculus for jump processes.↩︎